Overview

Dataset statistics

Number of variables: 25
Number of observations: 844338
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 161.0 MiB
Average record size in memory: 200.0 B
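
The memory figures above agree with each other under one simple assumption: each of the 25 columns costs 8 bytes per row (a 64-bit number, a timestamp, or an object pointer). The sketch below is a back-of-the-envelope check that uses only numbers from this table. The 8-byte width is an assumption, not something the report states.

```python
# Back-of-the-envelope check of the memory figures in the table above.
# Assumption (not stated in the report): every column stores 8 bytes per row.
N_ROWS = 844338
N_COLS = 25
BYTES_PER_VALUE = 8
MIB = 2 ** 20

column_mib = N_ROWS * BYTES_PER_VALUE / MIB   # per-variable "Memory size"
total_mib = column_mib * N_COLS               # "Total size in memory"
record_bytes = N_COLS * BYTES_PER_VALUE       # "Average record size in memory"

print(f"{column_mib:.1f} MiB per column")  # 6.4 MiB per column
print(f"{total_mib:.1f} MiB in total")     # 161.0 MiB in total
print(f"{record_bytes} B per record")      # 200 B per record
```

The three printed values match the 6.4 MiB reported for each variable, the 161.0 MiB total, and the 200.0 B average record size.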

Variable types

Numeric: 14
Categorical: 9
DateTime: 2

Warnings

df_index is highly correlated with store (High correlation)
store is highly correlated with df_index (High correlation)
competition_open_since_year is highly correlated with competition_time_month (High correlation)
day_of_week is highly correlated with dayofweek (High correlation)
month is highly correlated with weekofyear (High correlation)
weekofyear is highly correlated with month (High correlation)
dayofweek is highly correlated with day_of_week (High correlation)
competition_time_month is highly correlated with competition_open_since_year (High correlation)
df_index has unique values (Unique)
dayofweek has 137557 (16.3%) zeros (Zeros)
competition_time_month has 268025 (31.7%) zeros (Zeros)
promo_time_week has 421646 (49.9%) zeros (Zeros)
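
The day_of_week / dayofweek warning looks like a case of one variable stored under two numbering schemes. In this report the two means differ by exactly 1 (3.52034967 vs 2.52034967), the standard deviations are identical, and the 137557 zeros of dayofweek equal the count of day_of_week = 1. The sketch below illustrates why such a pair is perfectly correlated. It assumes day_of_week uses ISO numbering (Monday = 1) and dayofweek uses zero-based numbering (Monday = 0); the report does not confirm how either column was derived, and `pearson` is a helper written for this example.

```python
from datetime import date, timedelta
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Two full weeks starting at the first date covered by the report.
days = [date(2013, 1, 1) + timedelta(days=i) for i in range(14)]

iso_numbering = [d.isoweekday() for d in days]  # Monday=1 ... Sunday=7 (assumed day_of_week)
zero_based = [d.weekday() for d in days]        # Monday=0 ... Sunday=6 (assumed dayofweek)

shifted_by_one = all(a == b + 1 for a, b in zip(iso_numbering, zero_based))
r = pearson(iso_numbering, zero_based)
print(shifted_by_one, round(r, 6))  # True 1.0
```

If the assumption holds, one of the two columns carries no extra information and could be dropped before modelling.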

Reproduction

Analysis started: 2021-08-08 14:49:50.011949
Analysis finished: 2021-08-08 14:53:51.891863
Duration: 4 minutes and 1.88 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml
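
The reported duration can be recomputed from the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the Reproduction table.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2021-08-08 14:49:50.011949", FMT)
finished = datetime.strptime("2021-08-08 14:53:51.891863", FMT)

elapsed = (finished - started).total_seconds()
minutes, seconds = divmod(elapsed, 60)
print(f"{int(minutes)} minutes and {seconds:.2f} seconds")  # 4 minutes and 1.88 seconds
```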

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct: 844338
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 508596.3788
Minimum: 0
Maximum: 1017207
Zeros: 1
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 51060.85
Q1: 254520.25
median: 508254.5
Q3: 762678.75
95-th percentile: 966495.3
Maximum: 1017207
Range: 1017207
Interquartile range (IQR): 508158.5

Descriptive statistics

Standard deviation: 293481.2635
Coefficient of variation (CV): 0.5770415907
Kurtosis: -1.198309067
Mean: 508596.3788
Median Absolute Deviation (MAD): 254062.5
Skewness: 0.001379351741
Sum: 4.294272493 × 10^11
Variance: 8.6131252 × 10^10
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values
Value                  Count   Frequency (%)
0                      1       < 0.1%
292388                 1       < 0.1%
366104                 1       < 0.1%
362010                 1       < 0.1%
364059                 1       < 0.1%
374300                 1       < 0.1%
376349                 1       < 0.1%
370206                 1       < 0.1%
284192                 1       < 0.1%
286241                 1       < 0.1%
Other values (844328)  844328  > 99.9%

Minimum 5 values
Value    Count  Frequency (%)
0        1      < 0.1%
1        1      < 0.1%
2        1      < 0.1%
3        1      < 0.1%
4        1      < 0.1%

Maximum 5 values
Value    Count  Frequency (%)
1017207  1      < 0.1%
1017206  1      < 0.1%
1017205  1      < 0.1%
1017204  1      < 0.1%
1017202  1      < 0.1%
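
Several df_index figures are derived from others, so they can be cross-checked. The sketch below recomputes the coefficient of variation, interquartile range, range, variance, and sum from the reported mean, standard deviation, quartiles, extremes, and row count. The definitions used (CV = standard deviation / mean, IQR = Q3 - Q1, sum = mean × observations) are the usual textbook ones and are assumed to be what the report applies.

```python
# Values copied from the df_index summary in this report.
n_rows = 844338
mean = 508596.3788
std = 293481.2635
q1, q3 = 254520.25, 762678.75
minimum, maximum = 0, 1017207

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance from the standard deviation
total = mean * n_rows            # sum from the mean and the row count

print(f"CV       = {cv:.6f}")
print(f"IQR      = {iqr}")
print(f"Range    = {value_range}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")
```

Up to rounding of the reported inputs, the results agree with the values listed for df_index (CV 0.5770415907, IQR 508158.5, range 1017207, variance of about 8.6131 × 10^10, sum of about 4.2943 × 10^11).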

store
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1115
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 558.4213739
Minimum: 1
Maximum: 1115
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 56
Q1: 280
median: 558
Q3: 837
95-th percentile: 1060
Maximum: 1115
Range: 1114
Interquartile range (IQR): 557

Descriptive statistics

Standard deviation: 321.7308614
Coefficient of variation (CV): 0.5761435297
Kurtosis: -1.198836421
Mean: 558.4213739
Median Absolute Deviation (MAD): 278
Skewness: 0.0004258853753
Sum: 471496386
Variance: 103510.7472
Monotonicity: Increasing
Histogram with fixed size bins (bins=50)

Common values
Value                Count   Frequency (%)
335                  942     0.1%
733                  942     0.1%
682                  942     0.1%
562                  942     0.1%
1097                 942     0.1%
262                  942     0.1%
769                  942     0.1%
494                  942     0.1%
423                  942     0.1%
85                   942     0.1%
Other values (1105)  834918  98.9%

Minimum 5 values
Value  Count  Frequency (%)
1      781    0.1%
2      784    0.1%
3      779    0.1%
4      784    0.1%
5      779    0.1%

Maximum 5 values
Value  Count  Frequency (%)
1115   781    0.1%
1114   784    0.1%
1113   784    0.1%
1112   779    0.1%
1111   779    0.1%

store_type
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844338
Distinct characters: 4
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: c
2nd row: c
3rd row: c
4th row: c
5th row: c

Common values
Value  Count   Frequency (%)
a      457042  54.1%
d      258768  30.6%
c      112968  13.4%
b      15560   1.8%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
a      457042  54.1%
d      258768  30.6%
c      112968  13.4%
b      15560   1.8%

Most occurring categories

Value             Count   Frequency (%)
Lowercase Letter  844338  100.0%

Most frequent character per category

Value  Count   Frequency (%)
a      457042  54.1%
d      258768  30.6%
c      112968  13.4%
b      15560   1.8%

Most occurring scripts

Value  Count   Frequency (%)
Latin  844338  100.0%

Most frequent character per script

Value  Count   Frequency (%)
a      457042  54.1%
d      258768  30.6%
c      112968  13.4%
b      15560   1.8%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844338  100.0%

Most frequent character per block

Value  Count   Frequency (%)
a      457042  54.1%
d      258768  30.6%
c      112968  13.4%
b      15560   1.8%

assortment
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 8
Median length: 5
Mean length: 6.390156549
Min length: 5

Characters and Unicode

Total characters: 5395452
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: basic
2nd row: basic
3rd row: basic
4th row: basic
5th row: basic

Common values
Value     Count   Frequency (%)
basic     444875  52.7%
extended  391254  46.3%
extra     8209    1.0%

Histogram of lengths of the category

Most occurring characters

Value  Count    Frequency (%)
e      1181971  21.9%
d      782508   14.5%
a      453084   8.4%
b      444875   8.2%
s      444875   8.2%
i      444875   8.2%
c      444875   8.2%
x      399463   7.4%
t      399463   7.4%
n      391254   7.3%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  5395452  100.0%

Most frequent character per category

Value  Count    Frequency (%)
e      1181971  21.9%
d      782508   14.5%
a      453084   8.4%
b      444875   8.2%
s      444875   8.2%
i      444875   8.2%
c      444875   8.2%
x      399463   7.4%
t      399463   7.4%
n      391254   7.3%

Most occurring scripts

Value  Count    Frequency (%)
Latin  5395452  100.0%

Most frequent character per script

Value  Count    Frequency (%)
e      1181971  21.9%
d      782508   14.5%
a      453084   8.4%
b      444875   8.2%
s      444875   8.2%
i      444875   8.2%
c      444875   8.2%
x      399463   7.4%
t      399463   7.4%
n      391254   7.3%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  5395452  100.0%

Most frequent character per block

Value  Count    Frequency (%)
e      1181971  21.9%
d      782508   14.5%
a      453084   8.4%
b      444875   8.2%
s      444875   8.2%
i      444875   8.2%
c      444875   8.2%
x      399463   7.4%
t      399463   7.4%
n      391254   7.3%

competition_distance
Real number (ℝ≥0)

Distinct: 655
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5961.827515
Minimum: 20
Maximum: 200000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 20
5-th percentile: 130
Q1: 710
median: 2330
Q3: 6910
95-th percentile: 20930
Maximum: 200000
Range: 199980
Interquartile range (IQR): 6200

Descriptive statistics

Standard deviation: 12592.18111
Coefficient of variation (CV): 2.112134421
Kurtosis: 145.2886585
Mean: 5961.827515
Median Absolute Deviation (MAD): 1980
Skewness: 10.13490772
Sum: 5033797520
Variance: 158563025
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value               Count   Frequency (%)
250                 9210    1.1%
50                  6249    0.7%
350                 6239    0.7%
1200                6069    0.7%
190                 6066    0.7%
90                  5607    0.7%
180                 5421    0.6%
330                 5294    0.6%
150                 5292    0.6%
140                 4684    0.6%
Other values (645)  784207  92.9%

Minimum 5 values
Value  Count  Frequency (%)
20     779    0.1%
30     3115   0.4%
40     3888   0.5%
50     6249   0.7%
60     2342   0.3%

Maximum 5 values
Value   Count  Frequency (%)
200000  2186   0.3%
75860   887    0.1%
58260   885    0.1%
48330   784    0.1%
46590   784    0.1%

competition_open_since_month
Real number (ℝ≥0)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.787355301
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
median: 7
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.309916685
Coefficient of variation (CV): 0.4876592632
Kurtosis: -1.231875281
Mean: 6.787355301
Median Absolute Deviation (MAD): 3
Skewness: -0.04845105686
Sum: 5730822
Variance: 10.95554846
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values
Value             Count   Frequency (%)
9                 112179  13.3%
4                 98204   11.6%
11                86359   10.2%
3                 80052   9.5%
7                 76226   9.0%
12                63968   7.6%
6                 63913   7.6%
10                63216   7.5%
5                 58271   6.9%
2                 56895   6.7%
Other values (2)  85055   10.1%

Minimum 5 values
Value  Count  Frequency (%)
1      37733  4.5%
2      56895  6.7%
3      80052  9.5%
4      98204  11.6%
5      58271  6.9%

Maximum 5 values
Value  Count   Frequency (%)
12     63968   7.6%
11     86359   10.2%
10     63216   7.5%
9      112179  13.3%
8      47322   5.6%

competition_open_since_year
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 23
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2010.331102
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2002
Q1: 2008
median: 2012
Q3: 2014
95-th percentile: 2015
Maximum: 2015
Range: 115
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 5.502627816
Coefficient of variation (CV): 0.002737174891
Kurtosis: 123.9030779
Mean: 2010.331102
Median Absolute Deviation (MAD): 2
Skewness: -7.217322842
Sum: 1697398942
Variance: 30.27891288
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)

Common values
Value              Count   Frequency (%)
2013               170465  20.2%
2014               151774  18.0%
2015               91118   10.8%
2012               61716   7.3%
2005               46703   5.5%
2010               42715   5.1%
2011               41363   4.9%
2009               40711   4.8%
2008               40195   4.8%
2007               36125   4.3%
Other values (13)  121453  14.4%

Minimum 5 values
Value  Count  Frequency (%)
1900   622    0.1%
1961   779    0.1%
1990   3885   0.5%
1994   1552   0.2%
1995   1404   0.2%

Maximum 5 values
Value  Count   Frequency (%)
2015   91118   10.8%
2014   151774  18.0%
2013   170465  20.2%
2012   61716   7.3%
2011   41363   4.9%

promo2
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844338
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common values
Value  Count   Frequency (%)
0      423292  50.1%
1      421046  49.9%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
0      423292  50.1%
1      421046  49.9%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844338  100.0%

Most frequent character per category

Value  Count   Frequency (%)
0      423292  50.1%
1      421046  49.9%

Most occurring scripts

Value   Count   Frequency (%)
Common  844338  100.0%

Most frequent character per script

Value  Count   Frequency (%)
0      423292  50.1%
1      421046  49.9%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844338  100.0%

Most frequent character per block

Value  Count   Frequency (%)
0      423292  50.1%
1      421046  49.9%

promo2_since_week
Real number (ℝ≥0)

Distinct: 52
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.62908338
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 12
median: 22
Q3: 37
95-th percentile: 47
Maximum: 52
Range: 51
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 14.28831488
Coefficient of variation (CV): 0.6046918813
Kurtosis: -1.194814545
Mean: 23.62908338
Median Absolute Deviation (MAD): 12
Skewness: 0.1703986475
Sum: 19950933
Variance: 204.1559421
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value              Count   Frequency (%)
14                 69320   8.2%
40                 56919   6.7%
31                 42369   5.0%
10                 42002   5.0%
5                  39506   4.7%
1                  34479   4.1%
13                 33878   4.0%
37                 33528   4.0%
22                 32208   3.8%
18                 30709   3.6%
Other values (42)  429420  50.9%

Minimum 5 values
Value  Count  Frequency (%)
1      34479  4.1%
2      9644   1.1%
3      9784   1.2%
4      9778   1.2%
5      39506  4.7%

Maximum 5 values
Value  Count  Frequency (%)
52     4342   0.5%
51     6424   0.8%
50     7188   0.9%
49     7030   0.8%
48     13442  1.6%

promo2_since_year
Real number (ℝ≥0)

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2012.797915
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2012
median: 2013
Q3: 2014
95-th percentile: 2015
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.660124714
Coefficient of variation (CV): 0.0008247845956
Kurtosis: -0.1979112532
Mean: 2012.797915
Median Absolute Deviation (MAD): 1
Skewness: -0.788295691
Sum: 1699481766
Variance: 2.756014067
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count   Frequency (%)
2013   257197  30.5%
2014   227630  27.0%
2015   103528  12.3%
2011   95035   11.3%
2012   60712   7.2%
2009   53824   6.4%
2010   46412   5.5%

Minimum 5 values
Value  Count   Frequency (%)
2009   53824   6.4%
2010   46412   5.5%
2011   95035   11.3%
2012   60712   7.2%
2013   257197  30.5%

Maximum 5 values
Value  Count   Frequency (%)
2015   103528  12.3%
2014   227630  27.0%
2013   257197  30.5%
2012   60712   7.2%
2011   95035   11.3%

day_of_week
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.52034967
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.723712365
Coefficient of variation (CV): 0.4896423725
Kurtosis: -1.259347431
Mean: 3.52034967
Median Absolute Deviation (MAD): 2
Skewness: 0.0193099865
Sum: 2972365
Variance: 2.971184316
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count   Frequency (%)
6      144052  17.1%
2      143955  17.0%
3      141922  16.8%
5      138633  16.4%
1      137557  16.3%
4      134626  15.9%
7      3593    0.4%

Minimum 5 values
Value  Count   Frequency (%)
1      137557  16.3%
2      143955  17.0%
3      141922  16.8%
4      134626  15.9%
5      138633  16.4%

Maximum 5 values
Value  Count   Frequency (%)
7      3593    0.4%
6      144052  17.1%
5      138633  16.4%
4      134626  15.9%
3      141922  16.8%

date
Date

Distinct: 942
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
Minimum: 2013-01-01 00:00:00
Maximum: 2015-07-31 00:00:00
Histogram with fixed size bins (bins=50)

sales
Real number (ℝ≥0)

Distinct: 21733
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6955.959134
Minimum: 46
Maximum: 41551
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 46
5-th percentile: 3174
Q1: 4859
median: 6369
Q3: 8360
95-th percentile: 12668
Maximum: 41551
Range: 41505
Interquartile range (IQR): 3501

Descriptive statistics

Standard deviation: 3103.815515
Coefficient of variation (CV): 0.4462095673
Kurtosis: 4.854026586
Mean: 6955.959134
Median Absolute Deviation (MAD): 1694
Skewness: 1.594928836
Sum: 5873180623
Variance: 9633670.754
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value                 Count   Frequency (%)
5674                  215     < 0.1%
5558                  197     < 0.1%
5483                  196     < 0.1%
6214                  195     < 0.1%
6049                  195     < 0.1%
5723                  194     < 0.1%
5449                  192     < 0.1%
5489                  191     < 0.1%
5140                  191     < 0.1%
5041                  190     < 0.1%
Other values (21723)  842382  99.8%

Minimum 5 values
Value  Count  Frequency (%)
46     1      < 0.1%
124    1      < 0.1%
133    1      < 0.1%
286    1      < 0.1%
297    1      < 0.1%

Maximum 5 values
Value  Count  Frequency (%)
41551  1      < 0.1%
38722  1      < 0.1%
38484  1      < 0.1%
38367  1      < 0.1%
38037  1      < 0.1%

promo
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844338
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common values
Value  Count   Frequency (%)
0      467463  55.4%
1      376875  44.6%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
0      467463  55.4%
1      376875  44.6%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844338  100.0%

Most frequent character per category

Value  Count   Frequency (%)
0      467463  55.4%
1      376875  44.6%

Most occurring scripts

Value   Count   Frequency (%)
Common  844338  100.0%

Most frequent character per script

Value  Count   Frequency (%)
0      467463  55.4%
1      376875  44.6%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844338  100.0%

Most frequent character per block

Value  Count   Frequency (%)
0      467463  55.4%
1      376875  44.6%

state_holiday
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 14
Median length: 11
Mean length: 11.00281285
Min length: 9

Characters and Unicode

Total characters: 9290093
Distinct characters: 18
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: regular_day
2nd row: regular_day
3rd row: regular_day
4th row: regular_day
5th row: regular_day

Common values
Value           Count   Frequency (%)
regular_day     843428  99.9%
public_holiday  694     0.1%
easter_holiday  145     < 0.1%
christmas       71      < 0.1%

Histogram of lengths of the category

Most occurring characters

Value             Count    Frequency (%)
a                 1687911  18.2%
r                 1687072  18.2%
l                 844961   9.1%
_                 844267   9.1%
d                 844267   9.1%
y                 844267   9.1%
u                 844122   9.1%
e                 843718   9.1%
g                 843428   9.1%
i                 1604     < 0.1%
Other values (8)  4476     < 0.1%

Most occurring categories

Value                  Count    Frequency (%)
Lowercase Letter       8445826  90.9%
Connector Punctuation  844267   9.1%

Most frequent character per category

Lowercase Letter
Value             Count    Frequency (%)
a                 1687911  20.0%
r                 1687072  20.0%
l                 844961   10.0%
d                 844267   10.0%
y                 844267   10.0%
u                 844122   10.0%
e                 843718   10.0%
g                 843428   10.0%
i                 1604     < 0.1%
h                 910      < 0.1%
Other values (7)  3566     < 0.1%

Connector Punctuation
Value  Count   Frequency (%)
_      844267  100.0%

Most occurring scripts

Value   Count    Frequency (%)
Latin   8445826  90.9%
Common  844267   9.1%

Most frequent character per script

Latin
Value             Count    Frequency (%)
a                 1687911  20.0%
r                 1687072  20.0%
l                 844961   10.0%
d                 844267   10.0%
y                 844267   10.0%
u                 844122   10.0%
e                 843718   10.0%
g                 843428   10.0%
i                 1604     < 0.1%
h                 910      < 0.1%
Other values (7)  3566     < 0.1%

Common
Value  Count   Frequency (%)
_      844267  100.0%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  9290093  100.0%

Most frequent character per block

Value             Count    Frequency (%)
a                 1687911  18.2%
r                 1687072  18.2%
l                 844961   9.1%
_                 844267   9.1%
d                 844267   9.1%
y                 844267   9.1%
u                 844122   9.1%
e                 843718   9.1%
g                 843428   9.1%
i                 1604     < 0.1%
Other values (8)  4476     < 0.1%

school_holiday
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844338
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common values
Value  Count   Frequency (%)
0      680893  80.6%
1      163445  19.4%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
0      680893  80.6%
1      163445  19.4%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844338  100.0%

Most frequent character per category

Value  Count   Frequency (%)
0      680893  80.6%
1      163445  19.4%

Most occurring scripts

Value   Count   Frequency (%)
Common  844338  100.0%

Most frequent character per script

Value  Count   Frequency (%)
0      680893  80.6%
1      163445  19.4%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844338  100.0%

Most frequent character per block

Value  Count   Frequency (%)
0      680893  80.6%
1      163445  19.4%

is_promo
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 844338
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common values
Value  Count   Frequency (%)
0      713533  84.5%
1      130805  15.5%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
0      713533  84.5%
1      130805  15.5%

Most occurring categories

Value           Count   Frequency (%)
Decimal Number  844338  100.0%

Most frequent character per category

Value  Count   Frequency (%)
0      713533  84.5%
1      130805  15.5%

Most occurring scripts

Value   Count   Frequency (%)
Common  844338  100.0%

Most frequent character per script

Value  Count   Frequency (%)
0      713533  84.5%
1      130805  15.5%

Most occurring blocks

Value  Count   Frequency (%)
ASCII  844338  100.0%

Most frequent character per block

Value  Count   Frequency (%)
0      713533  84.5%
1      130805  15.5%

year
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 3377352
Distinct characters: 6
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2015
2nd row: 2015
3rd row: 2015
4th row: 2015
5th row: 2015

Common values
Value  Count   Frequency (%)
2013   337924  40.0%
2014   310385  36.8%
2015   196029  23.2%

Histogram of lengths of the category

Most occurring characters

Value  Count   Frequency (%)
2      844338  25.0%
0      844338  25.0%
1      844338  25.0%
3      337924  10.0%
4      310385  9.2%
5      196029  5.8%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  3377352  100.0%

Most frequent character per category

Value  Count   Frequency (%)
2      844338  25.0%
0      844338  25.0%
1      844338  25.0%
3      337924  10.0%
4      310385  9.2%
5      196029  5.8%

Most occurring scripts

Value   Count    Frequency (%)
Common  3377352  100.0%

Most frequent character per script

Value  Count   Frequency (%)
2      844338  25.0%
0      844338  25.0%
1      844338  25.0%
3      337924  10.0%
4      310385  9.2%
5      196029  5.8%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  3377352  100.0%

Most frequent character per block

Value  Count   Frequency (%)
2      844338  25.0%
0      844338  25.0%
1      844338  25.0%
3      337924  10.0%
4      310385  9.2%
5      196029  5.8%

month
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.845773849
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 6
Q3: 8
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.323959483
Coefficient of variation (CV): 0.5686089762
Kurtosis: -1.033189967
Mean: 5.845773849
Median Absolute Deviation (MAD): 3
Skewness: 0.2577064283
Sum: 4935809
Variance: 11.04870665
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)

Common values
Value             Count   Frequency (%)
1                 86335   10.2%
3                 85975   10.2%
7                 85576   10.1%
6                 82571   9.8%
4                 81726   9.7%
2                 80239   9.5%
5                 80099   9.5%
8                 54411   6.4%
10                53291   6.3%
9                 52321   6.2%
Other values (2)  101794  12.1%

Minimum 5 values
Value  Count  Frequency (%)
1      86335  10.2%
2      80239  9.5%
3      85975  10.2%
4      81726  9.7%
5      80099  9.5%

Maximum 5 values
Value  Count  Frequency (%)
12     50393  6.0%
11     51401  6.1%
10     53291  6.3%
9      52321  6.2%
8      54411  6.4%

weekofyear
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 52
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.64694589
Minimum: 1
Maximum: 52
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 3
Q1: 11
median: 23
Q3: 35
95-th percentile: 49
Maximum: 52
Range: 51
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.3899309
Coefficient of variation (CV): 0.6085323225
Kurtosis: -1.02576046
Mean: 23.64694589
Median Absolute Deviation (MAD): 12
Skewness: 0.2622868966
Sum: 19966015
Variance: 207.0701114
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values
Value              Count   Frequency (%)
26                 20119   2.4%
12                 20098   2.4%
9                  20093   2.4%
11                 20079   2.4%
6                  20066   2.4%
5                  20063   2.4%
8                  20053   2.4%
10                 20051   2.4%
4                  20044   2.4%
3                  20040   2.4%
Other values (42)  643632  76.2%

Minimum 5 values
Value  Count  Frequency (%)
1      15161  1.8%
2      19448  2.3%
3      20040  2.4%
4      20044  2.4%
5      20063  2.4%

Maximum 5 values
Value  Count  Frequency (%)
52     8319   1.0%
51     12355  1.5%
50     12333  1.5%
49     12334  1.5%
48     12334  1.5%

dayofweek
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.52034967
Minimum: 0
Maximum: 6
Zeros: 137557
Zeros (%): 16.3%
Negative: 0
Negative (%): 0.0%
Memory size: 6.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 2
Q3: 4
95-th percentile: 5
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.723712365
Coefficient of variation (CV): 0.683917944
Kurtosis: -1.259347431
Mean: 2.52034967
Median Absolute Deviation (MAD): 2
Skewness: 0.0193099865
Sum: 2128027
Variance: 2.971184316
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)

Common values
Value  Count   Frequency (%)
5      144052  17.1%
1      143955  17.0%
2      141922  16.8%
4      138633  16.4%
0      137557  16.3%
3      134626  15.9%
6      3593    0.4%

Minimum 5 values
Value  Count   Frequency (%)
0      137557  16.3%
1      143955  17.0%
2      141922  16.8%
3      134626  15.9%
4      138633  16.4%

Maximum 5 values
Value  Count   Frequency (%)
6      3593    0.4%
5      144052  17.1%
4      138633  16.4%
3      134626  15.9%
2      141922  16.8%

seasons
Categorical

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB

Length

Max length: 6
Median length: 6
Mean length: 5.627933363
Min length: 4

Characters and Unicode

Total characters: 4751878
Distinct characters: 14
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: summer
2nd row: summer
3rd row: summer
4th row: summer
5th row: summer

Common values
Value   Count   Frequency (%)
spring  246607  29.2%
winter  237969  28.2%
summer  202687  24.0%
fall    157075  18.6%

Histogram of lengths of the category

Most occurring characters

Value             Count   Frequency (%)
r                 687263  14.5%
i                 484576  10.2%
n                 484576  10.2%
s                 449294  9.5%
e                 440656  9.3%
m                 405374  8.5%
l                 314150  6.6%
p                 246607  5.2%
g                 246607  5.2%
w                 237969  5.0%
Other values (4)  754806  15.9%

Most occurring categories

Value             Count    Frequency (%)
Lowercase Letter  4751878  100.0%

Most frequent character per category

Value             Count   Frequency (%)
r                 687263  14.5%
i                 484576  10.2%
n                 484576  10.2%
s                 449294  9.5%
e                 440656  9.3%
m                 405374  8.5%
l                 314150  6.6%
p                 246607  5.2%
g                 246607  5.2%
w                 237969  5.0%
Other values (4)  754806  15.9%

Most occurring scripts

Value  Count    Frequency (%)
Latin  4751878  100.0%

Most frequent character per script

Value             Count   Frequency (%)
r                 687263  14.5%
i                 484576  10.2%
n                 484576  10.2%
s                 449294  9.5%
e                 440656  9.3%
m                 405374  8.5%
l                 314150  6.6%
p                 246607  5.2%
g                 246607  5.2%
w                 237969  5.0%
Other values (4)  754806  15.9%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  4751878  100.0%

Most frequent character per block

Value             Count   Frequency (%)
r                 687263  14.5%
i                 484576  10.2%
n                 484576  10.2%
s                 449294  9.5%
e                 440656  9.3%
m                 405374  8.5%
l                 314150  6.6%
p                 246607  5.2%
g                 246607  5.2%
w                 237969  5.0%
Other values (4)  754806  15.9%

competition_time_month
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 376
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 41.67967212
Minimum: -32
Maximum: 1407
Zeros: 268025
Zeros (%): 31.7%
Negative: 70101
Negative (%): 8.3%
Memory size: 6.4 MiB

Quantile statistics

Minimum: -32
5-th percentile: -7
Q1: 0
median: 16
Q3: 74
95-th percentile: 145
Maximum: 1407
Range: 1439
Interquartile range (IQR): 74

Descriptive statistics

Standard deviation: 66.8144125
Coefficient of variation (CV): 1.60304554
Kurtosis: 126.8558883
Mean: 41.67967212
Median Absolute Deviation (MAD): 17
Skewness: 7.338855632
Sum: 35191731
Variance: 4464.165718
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count    Frequency (%)
0                   268025   31.7%
1                   9476     1.1%
7                   5316     0.6%
5                   5234     0.6%
4                   5232     0.6%
6                   5214     0.6%
9                   5163     0.6%
8                   5147     0.6%
10                  5140     0.6%
11                  5038     0.6%
Other values (366)  525353   62.2%

Minimum 5 values

Value  Count  Frequency (%)
-32    30     < 0.1%
-31    147    < 0.1%
-30    323    < 0.1%
-29    445    0.1%
-28    593    0.1%

Maximum 5 values

Value  Count  Frequency (%)
1407   5      < 0.1%
1406   25     < 0.1%
1405   25     < 0.1%
1404   23     < 0.1%
1403   23     < 0.1%

promo_since

Distinct: 167
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.4 MiB
Minimum: 2009-07-27 00:00:00
Maximum: 2015-07-27 00:00:00
Histogram with fixed size bins (bins=50)

promo_time_week
Real number (ℝ)

ZEROS

Distinct: 440
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 54.40069854
Minimum: -126
Maximum: 313
Zeros: 421646
Zeros (%): 49.9%
Negative: 57241
Negative (%): 6.8%
Memory size: 6.4 MiB

Quantile statistics

Minimum: -126
5-th percentile: -19
Q1: 0
Median: 0
Q3: 109
95-th percentile: 230
Maximum: 313
Range: 439
Interquartile range (IQR): 109

Descriptive statistics

Standard deviation: 85.45755889
Coefficient of variation (CV): 1.570890838
Kurtosis: 0.1129960976
Mean: 54.40069854
Median Absolute Deviation (MAD): 1
Skewness: 1.103383487
Sum: 45932577
Variance: 7302.994372
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values

Value               Count    Frequency (%)
0                   421646   49.9%
52                  3910     0.5%
98                  1872     0.2%
102                 1847     0.2%
97                  1830     0.2%
103                 1828     0.2%
101                 1778     0.2%
99                  1777     0.2%
94                  1770     0.2%
93                  1764     0.2%
Other values (430)  404316   47.9%

Minimum 5 values

Value  Count  Frequency (%)
-126   12     < 0.1%
-125   18     < 0.1%
-124   18     < 0.1%
-123   18     < 0.1%
-122   18     < 0.1%

Maximum 5 values

Value  Count  Frequency (%)
313    35     < 0.1%
312    42     < 0.1%
311    42     < 0.1%
310    42     < 0.1%
309    42     < 0.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
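
As a minimal sketch of that calculation (an illustration written for this section, not the code pandas-profiling runs), the helper below divides the covariance by the product of the standard deviations. The function name `pearson_r` and the sample data are assumptions made for the example.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (divide-by-n) forms are used throughout; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

r_original = pearson_r(x, y)
# Shifting and (positively) rescaling y leaves r unchanged, up to floating-point error.
r_rescaled = pearson_r(x, [3.0 * v + 7.0 for v in y])
print(r_original, r_rescaled)
```

The second call illustrates the location and scale invariance mentioned above.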

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
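
The following sketch spells this out: rank both variables (ties receive the average of their positions), then apply the Pearson formula to the ranks. It is an illustration under those assumptions rather than the report's own implementation; `average_ranks`, `pearson_r` and `spearman_rho` are names introduced here.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]   # monotonic but strongly nonlinear

print(spearman_rho(x, y), pearson_r(x, y))
```

For this monotonic, nonlinear relationship ρ reaches 1 (up to rounding) while Pearson's r stays below 1.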

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
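
A direct, quadratic-time sketch of that definition is shown below. It implements the plain form described above (often called τ-a), in which pairs tied in either variable count as neither concordant nor discordant; library implementations commonly apply a tie correction, so their results can differ when ties are present. The name `kendall_tau` is introduced here for illustration.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1      # both variables move in the same direction
        elif direction < 0:
            discordant += 1      # the variables move in opposite directions
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs

# Of the six pairs, five are concordant and one is discordant: (5 - 1) / 6.
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))
```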

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
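
For orientation, here is a sketch of the bias-corrected estimator as it is usually stated: the χ² statistic is turned into φ² = χ²/n, φ² and the table dimensions are each shrunk by a small-sample correction, and V is the square root of their ratio. This is written from the published formula and is not taken from the pandas-profiling source; the function name `cramers_v_corrected` and the example tables are assumptions, and the code assumes every row and column total is non-zero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r = len(table)               # number of rows
    k = len(table[0])            # number of columns
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

print(cramers_v_corrected([[10, 10], [10, 10]]))   # independent counts
print(cramers_v_corrected([[50, 0], [0, 50]]))     # perfectly associated counts
```

The first table gives 0 and the second gives 1 up to floating-point rounding, matching the endpoints described above.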

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.

Sample

First rows

df_indexstorestore_typeassortmentcompetition_distancecompetition_open_since_monthcompetition_open_since_yearpromo2promo2_since_weekpromo2_since_yearday_of_weekdatesalespromostate_holidayschool_holidayis_promoyearmonthweekofyeardayofweekseasonscompetition_time_monthpromo_sincepromo_time_week
001cbasic1270.0092008031201552015-07-3152631regular_day1020157314summer842015-07-270
111cbasic1270.0092008031201542015-07-3050201regular_day1020157313summer842015-07-270
221cbasic1270.0092008031201532015-07-2947821regular_day1020157312summer842015-07-270
331cbasic1270.0092008031201522015-07-2850111regular_day1020157311summer842015-07-270
441cbasic1270.0092008031201512015-07-2761021regular_day1020157310summer842015-07-270
561cbasic1270.0092008030201562015-07-2543640regular_day0020157305summer832015-07-200
671cbasic1270.0092008030201552015-07-2437060regular_day0020157304summer832015-07-200
781cbasic1270.0092008030201542015-07-2337690regular_day0020157303summer832015-07-200
891cbasic1270.0092008030201532015-07-2234640regular_day0020157302summer832015-07-200
9101cbasic1270.0092008030201522015-07-2135580regular_day0020157301summer832015-07-200

Last rows

df_indexstorestore_typeassortmentcompetition_distancecompetition_open_since_monthcompetition_open_since_yearpromo2promo2_since_weekpromo2_since_yearday_of_weekdatesalespromostate_holidayschool_holidayis_promoyearmonthweekofyeardayofweekseasonscompetition_time_monthpromo_sincepromo_time_week
84432810171971115dextended5350.0012013122201262013-01-1244970regular_day002013125winter02012-05-2133
84432910171981115dextended5350.0012013122201252013-01-1151421regular_day102013124winter02012-05-2133
84433010171991115dextended5350.0012013122201242013-01-1050071regular_day102013123winter02012-05-2133
84433110172001115dextended5350.0012013122201232013-01-0946491regular_day102013122winter02012-05-2133
84433210172011115dextended5350.0012013122201222013-01-0852431regular_day102013121winter02012-05-2133
84433310172021115dextended5350.0012013122201212013-01-0769051regular_day102013120winter02012-05-2133
84433410172041115dextended5350.0012013122201262013-01-0547710regular_day102013115winter02012-05-2132
84433510172051115dextended5350.0012013122201252013-01-0445400regular_day102013114winter02012-05-2132
84433610172061115dextended5350.0012013122201242013-01-0342970regular_day102013113winter02012-05-2132
84433710172071115dextended5350.0012013122201232013-01-0236970regular_day102013112winter02012-05-2132